Linear-Space Computation of the Edit-Distance between a String and a Finite Automaton

نویسندگان

  • Cyril Allauzen
  • Mehryar Mohri
چکیده

The problem of computing the edit-distance between a string and a finite automaton arises in a variety of applications in computational biology, text processing, and speech recognition. This paper presents linear-space algorithms for computing the edit-distance between a string and an arbitrary weighted automaton over the tropical semiring, or an unambiguous weighted automaton over an arbitrary semiring. It also gives an efficient linear-space algorithm for finding an optimal alignment of a string and such a weighted automaton.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Nondeterministic Finite Automata in Hardware - the Case of the Levenshtein Automaton

The Levenshtein Nondeterministic Finite state Automaton (NFA) recognizes input strings within a set edit distance of a configured pattern in linear time. This automaton can be pipelined to recognize all substrings of an input text in linear time with additional use of nondeterminism. In general, von Neumann hardware cannot directly execute NFAs without significant time or space overhead. A von ...

متن کامل

Learning state machine-based string edit kernels

During the past few years, several works have been done to derive string kernels from probability distributions. For instance, the Fisher kernel uses a generative model M (e.g. a hidden markov model) and compares two strings according to how they are generated by M . On the other hand, the marginalized kernels allow the computation of the joint similarity between two instances by summing condit...

متن کامل

A fast algorithm for finding the nearest neighbor of a word in a dictionary

In this paper a new algorithm for string edit distance computation is proposed. It is based on the classical approach [11]. However, while in [11] the two strings to be compared may be given online, our algorithm assumes that one of the two strings to be compared is a dictionary entry that is known a priori. This dictionary word is converted, in an o -line phase to be carried out beforehand, in...

متن کامل

Efficient Similarity Search in Very Large String Sets

String similarity search is required by many real-life applications, such as spell checking, data cleansing, fuzzy keyword search, or comparison of DNA sequences. Given a very large string set and a query string, the string similarity search problem is to efficiently find all strings in the string set that are similar to the query string. Similarity is defined using a similarity (or distance) m...

متن کامل

Edit Distance for Pushdown Automata

The edit distance between two words w1, w2 is the minimal number of word operations (letter insertions, deletions, and substitutions) necessary to transform w1 to w2. The edit distance generalizes to languages L1,L2, where the edit distance is the minimal number k such that for every word from L1 there exists a word in L2 with edit distance at most k. We study the edit distance computation prob...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/0904.4686  شماره 

صفحات  -

تاریخ انتشار 2008